Hierarchical conditional random fields for web extraction

ABSTRACT

A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

BACKGROUND

Web pages accessible via the Internet contain a vast amount of information. A web page may contain information about various types of objects such as products, people, papers, organizations, and so on. For example, one web page may contain a product review of a certain model of camera, and another web page may contain an advertisement offering to sell that model of camera at a certain price. As another example, one web page may contain a journal article, and another web page may be the homepage of an author of the journal article. A person who is searching for information about an object may need information that is contained in different web pages. For example, a person who is interested in purchasing a certain camera may want to read reviews of the camera and to determine who is offering the camera at the lowest price.

To obtain such information, a person would typically use a search engine to find web pages that contain information about the camera. The person would enter a search query that may include the manufacturer and model number of the camera. The search engine then identifies web pages that match the search query and presents those web pages to the user in an order that is based on how relevant the content of the web page is to the search query. The person would then need to view the various web pages to find the desired information. For example, the person may first try to find web pages that contain reviews of the camera. After reading the reviews, the person may then try to locate a web page that contains an advertisement for the camera at the lowest price.

Web search systems have not been particularly helpful to users trying to find information about a specific product because of the difficulty in accurately identifying objects and their attributes from web pages. Web pages often allocate a record for each object that is to be displayed. For example, a web page that lists several cameras for sale may include a record for each camera. Each record contains attributes of the object such as an image of the camera, its make and model, and its price. Web pages contain a wide variety of layouts of records and layouts of attributes within records. Systems identifying records and their attributes from web pages are typically either template-dependent or template-independent. Template-dependent systems may have templates for both the layout of records on web pages and the layout of attributes within a record. Such a system finds record templates that match portions of a web page and then finds attribute templates that match the attributes of the record. Template-independent systems, in contrast, typically try to identify whether a web page is a list page (i.e., listing multiple records) or a detail page (i.e., a single record). The template-independent system then tries to identify records from meta-data of the web page (e.g., tables) based on this distinction. Such systems may then use various heuristics to identify the attributes of the records.

A difficulty with these systems is that records are often incorrectly identified. An error in the identification of a record will propagate to the identification of attributes. As a result, the overall accuracy is limited by the accuracy of the identification of records. Another difficulty with these systems is that they typically do not take into consideration the semantics of the content of a portion that is identified as a record. A person, in contrast, can easily identify records by factoring in the semantics of their content.

SUMMARY

A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. Thus, the labeling system jointly identifies records and labels elements in a way that is more effective than if performed separately. To jointly identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page with blocks being represented as vertices of the hierarchical representation. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system may use a propagation technique to propagate the effect of a labeling of one block to the other blocks within the hierarchical representation. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

A labeling system uses a hierarchical conditional random fields (“CRF”) technique to label the object elements.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram that illustrates an example web page and its corresponding vision tree.

FIG. 2 is a diagram that illustrates the graphical structure of the hierarchical conditional random fields of a vision tree.

FIG. 3 is a diagram that represents the junction tree corresponding to FIG. 2.

FIG. 4 is a block diagram that illustrates components of the labeling system in one embodiment.

FIG. 5 is a flow diagram that illustrates the processing of the label documents component of the labeling system in one embodiment.

FIG. 6 is a flow diagram that illustrates the processing of the generate junction tree component of the labeling system in one embodiment.

FIG. 7 is a flow diagram that illustrates the processing of the propagate beliefs component of the labeling system in one embodiment.

FIG. 8 is a flow diagram that illustrates the processing of the collect component of the labeling system in one embodiment.

FIG. 9 is a flow diagram that illustrates the processing of the distribute component of the labeling system in one embodiment.

FIG. 10 is a flow diagram that illustrates the processing of the learn parameters component of the labeling system in one embodiment.

DETAILED DESCRIPTION

A method and system for labeling object information of an information page is provided. In one embodiment, a labeling system identifies an object record of an information page, such as a web page, based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of a web page with blocks being represented as vertices of the hierarchical representation. A block may represent a collection of information of a web page that is visually related. A root block represents the entire web page, a leaf block is an atomic unit (such as an element of a record), and inner blocks represent collections of their child blocks. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block. The labeling system calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. A related block may be a block that is either a parent block or a nearest sibling block within the hierarchical representation. A collection of related blocks is referred to as a “clique.” The labeling system may define feature functions that generate scores, which are combined to give an overall score for a label for a block. A feature function may evaluate the features of a block itself, the combined features of a block and a related block, and the combined features of a block and all its related blocks. The labeling system may use a propagation technique to propagate the effect of a labeling of one block to the other blocks within the hierarchical representation. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.

In one embodiment, the labeling system uses a vision-based page segmentation (“VIPS”) technique to generate a hierarchical representation of blocks of a web page. One VIPS technique is described in Cai, D., Yu, S., Wen, J., and Ma, W., “VIPS: A Vision-Based Page Segmentation Algorithm,” Microsoft Technical Report, MSR-TR-2003-79, 2003, which is hereby incorporated by reference. A VIPS technique uses page layout features (e.g., font, color, and size) to construct a “vision tree” for a web page. The technique identifies nodes from the HTML tag tree and identifies separators (e.g., horizontal and vertical lines) between the nodes. The technique creates a vision tree that has a vertex, referred to as a block, for each identified node. The hierarchical representation of the blocks can effectively keep related blocks together while separating semantically different blocks. FIG. 1 is a diagram that illustrates an example web page and its corresponding vision tree. The web page 110 includes data records 120 and 130, which correspond to blocks 160 and 170, respectively, of vision tree 150. The vision tree includes leaf blocks 162 and 163 corresponding to image 122 and description 123 and leaf blocks 172, 174, and 175, corresponding to image 132 and descriptions 134 and 135.

In one embodiment, the labeling system performs a joint optimization for record identification and element (or attribute) labeling. The labeling system generates a feature vector for each block. The feature vectors are represented as X={X₀, X₁, . . . , X_(N-1)} where X_(i) represents the feature vector for block i. The labeling system represents the labels of the blocks as the vectors Y={Y₀, Y₁, . . . , Y_(N-1)} where Y_(i) represents the label for block i. The goal of the labeling system is to calculate the maximum posterior probability of Y and extract data from the assignment as represented by the following:

$y^{*} = {\arg\max\limits_{y}\; {p\left( y \middle| x \right)}}$

where y* represents the labeling with the highest probability for the block represented by the feature vector x. The labeling system thus provides a uniform framework for record identification and attribute labeling. As a result, records that are wrongly identified and cause attribute labeling to perform badly will have a low probability and thus not be selected as the correct labeling. Furthermore, since record identification and attribute labeling are conducted simultaneously, the labeling system can leverage the attribute labels for a better record identification.

In one embodiment, the labeling system uses a hierarchical conditional random fields (“CRF”) technique to label the records and elements of the vision tree representing a web page. CRFs are Markov random fields globally conditioned on the observations X. The graph G=(V, E) is an undirected graph of CRFs. According to CRFs, the conditional distribution of the labels y given the observations x has the form represented by the following:

$\begin{matrix}{{p\left( y \middle| x \right)} = {\frac{1}{Z(x)}{\prod\limits_{c \in C}\; {\phi_{c}\left( {c,\left. y \right|_{c},x} \right)}}}} & (1)\end{matrix}$

where C represents a set of cliques in graph G, y|_(c) represents the components of y associated with clique c, φ_(c) represents a potential function defined on y|_(c), and Z is a normalization factor.

FIG. 2 is a diagram that illustrates the graphical structure of the hierarchical conditional random fields of a vision tree. The circles represent inner blocks and the rectangles represent leaf blocks. The observations that are globally conditioned are not shown. The labeling system assumes that every inner block contains at least two child blocks. If an inner block does not contain two child blocks, the labeling system replaces the parent block with the child block. The cliques of the graph in FIG. 2 are its vertices, edges, and triangles. The labeling system represents the conditional probability of Equation 1 as follows:
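The clique structure described above can be illustrated with a minimal Python sketch. It is not the labeling system's implementation: the `Block` class and the `collapse` and `enumerate_cliques` helpers are hypothetical names, and the sketch assumes that edges join each parent to its children and each child to its next sibling, so that a triangle is a parent together with two adjacent children. The block names reuse the reference numerals of FIG. 1 purely as labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A vertex of the vision tree; leaf blocks have no children."""
    name: str
    children: List["Block"] = field(default_factory=list)


def collapse(block: Block) -> Block:
    """Replace every inner block that has a single child with that child."""
    kids = [collapse(child) for child in block.children]
    if len(kids) == 1:
        return kids[0]
    return Block(block.name, kids)


def enumerate_cliques(root: Block):
    """Return the vertex, edge, and triangle cliques of the hierarchical graph.

    Assumption: related blocks are parent/child pairs and adjacent siblings.
    """
    vertices, edges, triangles = [], [], []
    stack = [root]
    while stack:
        block = stack.pop()
        vertices.append(block.name)
        for child in block.children:
            edges.append((block.name, child.name))            # parent-child edge
        for left, right in zip(block.children, block.children[1:]):
            edges.append((left.name, right.name))             # sibling edge
            triangles.append((block.name, left.name, right.name))
        stack.extend(block.children)
    return vertices, edges, triangles


# Vision tree shaped like the example of FIG. 1 (names are illustrative only).
page = Block("page", [
    Block("r160", [Block("b162"), Block("b163")]),
    Block("r170", [Block("b172"), Block("b174"), Block("b175")]),
])
vertices, edges, triangles = enumerate_cliques(collapse(page))
```

Every triangle produced this way has exactly three vertices, which matches the statement that the largest cliques of the graph have size 3.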

$\begin{matrix}{{p\left( y \middle| x \right)} = {\frac{1}{Z(x)}{\exp \begin{pmatrix}{{\sum\limits_{v,k}{\mu_{k}{g_{k}\left( {v,\left. y \right|_{v},x} \right)}}} +} \\{{\sum\limits_{e,k}{\lambda_{k}{f_{k}\left( {e,\left. y \right|_{e},x} \right)}}} +} \\{\sum\limits_{t,k}{\gamma_{k}{h_{k}\left( {t,\left. y \right|_{t},x} \right)}}}\end{pmatrix}}}} & (2)\end{matrix}$

where g_(k), f_(k), and h_(k) represent feature functions defined on three types of cliques (i.e., vertex, edge, and triangle, respectively); μ_(k), λ_(k), and γ_(k) represent the corresponding weights; v∈V; e∈E; and t is a triangle, which is also a maximum clique. Although the feature functions generate real values, the labeling system may be implemented so that they are Boolean, that is, true if the feature matches and false otherwise. An example feature function is represented by the following:

$\begin{matrix}{{g_{k}\left( {y_{i},x} \right)} = \left\{ \begin{matrix}{{true},} & {{{if}\mspace{14mu} y_{i}} = {{Name}\mspace{14mu} {and}\mspace{14mu} x\mspace{14mu} {is}\mspace{14mu} {capitalized}}} \\{{false},} & {otherwise}\end{matrix} \right.} & (3)\end{matrix}$

which means that if the content of vertex x is capitalized, then the function returns a value of true when the label y_(i) is “Name.”
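As an illustration of Equations 1 through 3, the following Python sketch scores every labeling of a tiny two-block graph and normalizes by brute-force enumeration. It is a sketch under stated assumptions rather than the labeling system's code: the label set, the features `g_price_has_digit` and `f_labels_differ`, the weights, and the sample text are all hypothetical, and the Boolean features return 1.0 or 0.0 so they can be weighted. Enumeration is only feasible for very small graphs.

```python
import itertools
import math

LABELS = ("Name", "Price")  # hypothetical label set


def g_name_capitalized(v, y_v, x):
    """Vertex feature in the spirit of Equation 3."""
    return 1.0 if y_v == "Name" and x[v][:1].isupper() else 0.0


def g_price_has_digit(v, y_v, x):
    """Hypothetical vertex feature: a Price block contains a digit."""
    return 1.0 if y_v == "Price" and any(ch.isdigit() for ch in x[v]) else 0.0


def f_labels_differ(e, y_e, x):
    """Hypothetical edge feature: the two related blocks carry different labels."""
    return 1.0 if y_e[0] != y_e[1] else 0.0


def score(y, x, edges, triangles, vertex_feats, edge_feats, tri_feats):
    """Exponent of Equation 2: weighted feature sums over all cliques."""
    total = 0.0
    for v in range(len(x)):
        total += sum(w * g(v, y[v], x) for w, g in vertex_feats)
    for e in edges:
        total += sum(w * f(e, (y[e[0]], y[e[1]]), x) for w, f in edge_feats)
    for t in triangles:
        total += sum(w * h(t, tuple(y[i] for i in t), x) for w, h in tri_feats)
    return total


def posterior(x, edges, triangles, vertex_feats, edge_feats, tri_feats):
    """p(y|x) for every labeling, with Z(x) computed by enumeration."""
    labelings = list(itertools.product(LABELS, repeat=len(x)))
    unnormalized = [
        math.exp(score(y, x, edges, triangles, vertex_feats, edge_feats, tri_feats))
        for y in labelings
    ]
    z = sum(unnormalized)
    return {y: u / z for y, u in zip(labelings, unnormalized)}


x = ["Canon PowerShot", "$199.99"]                      # illustrative block text
p = posterior(
    x, edges=[(0, 1)], triangles=[],
    vertex_feats=[(1.0, g_name_capitalized), (1.0, g_price_has_digit)],
    edge_feats=[(0.5, f_labels_differ)], tri_feats=[],
)
y_star = max(p, key=p.get)                              # labeling with highest p(y|x)
```

With these illustrative weights, `y_star` is the labeling that assigns “Name” to the first block and “Price” to the second.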

The labeling system determines weights for the feature functions using the training data D={(y^(i), x^(i))}_(i=1)^(N) with the empirical distribution p̃(x,y) where N is the number of sets of labeled observations in the training data. The labeling system represents the log-likelihood of p̃(x,y) with respect to a conditional model p(y|x,Θ) according to the following:

$\begin{matrix}{{L(\Theta)} = {\prod\limits_{x,y}{{\overset{\sim}{p}\left( {x,y} \right)}\log \; {p\left( {\left. y \middle| x \right.,\Theta} \right)}}}} & (4)\end{matrix}$

where Θ={μ₁, μ₂, . . . ; λ₁, λ₂, . . . ; γ₁, γ₂, . . . } represents the set of weights for the feature functions. The labeling system identifies the weights as the values that maximize the concave log-likelihood function. The labeling system may use various techniques to determine the weights. For example, the labeling system can use techniques used in other maximum-entropy models as described in Lafferty, J., McCallum, A., & Pereira, F., “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” in Proc. ICML, 2001. The labeling system may also use a gradient-based L-BFGS as described in Liu, D. C., & Nocedal, J., “On The Limited Memory BFGS Method for Large Scale Optimization,” Mathematical Programming 45, pp. 503-528, 1989. The gradient-based model represents each element of the gradient vector as follows:

$\begin{matrix}{\frac{\partial{L(\Theta)}}{\partial\lambda_{k}} = {{E_{\overset{\sim}{p}{({x,y})}}\left\lbrack f_{k} \right\rbrack} - {E_{p{({{y|x},\Theta})}}\left\lbrack f_{k} \right\rbrack}}} & (5)\end{matrix}$

where E_(p̃(x,y))[f_(k)] is the expectation with respect to the empirical distribution and E_(p(y|x,Θ))[f_(k)] is the expectation with respect to the conditional model distribution. For example, the expectations of f_(k) are:

$\begin{matrix}{{{E_{\overset{\sim}{p}{({x,y})}}\left\lbrack f_{k} \right\rbrack} = {\sum\limits_{x,y}{{\overset{\sim}{p}\left( {x,y} \right)}{\sum\limits_{e \in E}{\sum\limits_{y_{i},y_{j}}{f_{k}\left( {e,y_{i},y_{j},x} \right)}}}}}}{{E_{p{({{y|x},\Theta})}}\left\lbrack f_{k} \right\rbrack} = {\sum\limits_{x}{{\overset{\sim}{p}(x)}{\sum\limits_{e \in E}{\sum\limits_{y_{i},y_{j}}{{p\left( {y_{i},\left. y_{j} \middle| x \right.} \right)}{f_{k}\left( {e,y_{i},y_{j},x} \right)}}}}}}}} & (6)\end{matrix}$

where e=(i, j) is an edge.
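As a worked step, and assuming the exponential form of Equation 2, the gradient of Equation 5 can be recovered from the log-likelihood of Equation 4. The derivation below is a sketch for an edge weight λ_(k); the symbol F_(k) is shorthand introduced here for the edge feature summed over the graph, and the vertex and triangle weights follow the same pattern.

$\begin{aligned}
F_{k}(y,x) &= \sum\limits_{e \in E} f_{k}\left( e, \left. y \right|_{e}, x \right) \\
\frac{\partial \log p\left( y \middle| x,\Theta \right)}{\partial\lambda_{k}} &= F_{k}(y,x) - \frac{\partial \log Z(x)}{\partial\lambda_{k}} \\
\frac{\partial \log Z(x)}{\partial\lambda_{k}} &= \sum\limits_{y'} p\left( y' \middle| x,\Theta \right) F_{k}(y',x) \\
\frac{\partial L(\Theta)}{\partial\lambda_{k}} &= \sum\limits_{x,y} \overset{\sim}{p}(x,y)\, F_{k}(y,x) - \sum\limits_{x} \overset{\sim}{p}(x) \sum\limits_{y'} p\left( y' \middle| x,\Theta \right) F_{k}(y',x)
\end{aligned}$

The second term depends on the labeling only through the labels of the two endpoints of each edge, so it can be evaluated from the pairwise marginals p(y_(i), y_(j)|x) rather than from the full joint distribution.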

The labeling system calculates the expectation for the empirical distribution once and calculates the marginal probabilities for the model distribution during each iteration while solving the optimization problem of Equation 4. Since the graph of FIG. 2 is a chordal graph, the labeling system performs inference to calculate the marginal probabilities using a junction tree algorithm. A junction tree algorithm is described in Cowell, R., Dawid, A., Lauritzen, S., and Spiegelhalter, D., “Probabilistic Networks and Expert Systems,” Springer-Verlag, 1999. A junction tree algorithm constructs the junction tree, initializes potentials of the vertices of the junction tree, and propagates beliefs among the vertices. FIG. 3 represents the junction tree corresponding to FIG. 2. The ellipses represent cliques, and the rectangles represent separators. All the cliques have size 3, since the maximum clique in FIG. 2 is size 3. The labeling system builds the junction tree by obtaining a set of maximal elimination cliques using node elimination. The labeling system then builds a complete cluster graph with weights over cliques. The labeling system selects the spanning tree with the maximum weight as the junction tree.
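The spanning-tree step can be sketched in Python as follows. The sketch assumes that the weight of a cluster-graph edge is the number of variables shared by its two cliques, which is a common choice but is not stated in the text, and it uses Kruskal's algorithm with a small union-find. The function name `junction_tree` and the clique contents are hypothetical. Pairs of cliques with no shared variable are skipped, which does not change the result when the underlying graph is connected.

```python
from itertools import combinations


def junction_tree(cliques):
    """Return (i, j, separator) edges of a maximum-weight spanning tree.

    Assumption: edge weight = size of the intersection of the two cliques.
    """
    candidates = []
    for i, j in combinations(range(len(cliques)), 2):
        separator = cliques[i] & cliques[j]
        if separator:
            candidates.append((len(separator), i, j, separator))
    candidates.sort(key=lambda c: -c[0])        # heaviest edges first

    parent = list(range(len(cliques)))          # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a

    tree = []
    for _, i, j, separator in candidates:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:                    # edge joins two components
            parent[root_i] = root_j
            tree.append((i, j, separator))
    return tree


# Maximal cliques of a small hierarchical graph (names are illustrative only).
cliques = [
    frozenset({"page", "r160", "r170"}),
    frozenset({"r160", "b162", "b163"}),
    frozenset({"r170", "b172", "b174"}),
    frozenset({"r170", "b174", "b175"}),
]
tree = junction_tree(cliques)
```

Each returned edge carries its separator, which corresponds to a rectangle of the kind shown in FIG. 3.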

After the junction tree has been constructed, the labeling system initializes all the potentials of the junction tree to have a value of 1 and multiplies the potential of a vertex, an edge, or a triangle into the potential of any one clique node of the junction tree T that covers its variables. The potential of a vertex v, an edge e, and a triangle t is represented by the following:

$\begin{matrix}{{{{\phi_{v}\left( {\left. y \right|_{v},x} \right)} = {\exp \left( {\sum\limits_{k}{\mu_{k}{g_{k}\left( {v,\left. y \right|_{v},x} \right)}}} \right)}},{{\phi_{e}\left( {\left. y \right|_{e},x} \right)} = {\exp \left( {\sum\limits_{k}{\lambda_{k}{f_{k}\left( {e,\left. y \right|_{e},x} \right)}}} \right)}},{and}}{{\phi_{t}\left( {\left. y \right|_{t},x} \right)} = {{\exp \left( {\sum\limits_{k}{\gamma_{k}{h_{k}\left( {t,\left. y \right|_{t},x} \right)}}} \right)}.}}} & (7)\end{matrix}$

The labeling system in one embodiment uses a two-phase schedule algorithm to propagate beliefs within the junction tree. A two-phase schedule algorithm is described in Jensen, F., Lauritzen, S., and Olesen, K., “Bayesian Updating in Causal Probabilistic Networks by Local Computations,” Computational Statistics Quarterly, 4:269-82, 1990. The algorithm uses a collection and distribution phase to calculate the potentials for the cliques and separators. One skilled in the art will appreciate that the labeling system can use other message passing techniques to propagate beliefs. Upon completion of the distribution phase, the potentials represent marginal potentials that are used by the labeling system to guide finding the solution for the weights that best match the training data.

After learning the weights, the labeling system uses the weights to find labels for the blocks of web pages. The labeling system uses the VIPS technique, the junction tree algorithm, and a modified two-phase schedule algorithm to find the best labeling. The labeling system generates a vision tree from the web page and generates a junction tree. The labeling system modifies the two-phase schedule algorithm by replacing its summations with maximizations. The best labeling for a block is found from the potential of any clique that contains the block.

FIG. 4 is a block diagram that illustrates components of the labeling system in one embodiment. The labeling system 410 is connected to web sites 420 via communications link 430. The labeling system includes a document store 411 and a training data store 412. The document store contains web pages that may be collected from the various web sites. The training data store contains vision trees generated from the web pages along with the correct labeling of the records and elements of the web pages. The labeling system also includes a learn parameters component 413 and a label documents component 414. The learn parameters component inputs the training data and generates the weights for the feature functions. The label documents component inputs a web page and identifies the correct labeling for the blocks within the web page. The labeling system also includes auxiliary components such as a generate junction tree component 415, a propagate beliefs component 416, a collect component 417, and a distribute component 418 as described below in detail.

The computing devices on which the labeling system may be implemented may include a central processing unit, memory, input devices (e.g., keyboard and pointing devices), output devices (e.g., display devices), and storage devices (e.g., disk drives). The memory and storage devices are computer-readable media that may contain instructions that implement the labeling system. In addition, the data structures and message structures may be stored or transmitted via a data transmission medium, such as a signal on a communications link. Various communications links may be used, such as the Internet, a local area network, a wide area network, or a point-to-point dial-up connection.

The labeling system may be used in various operating environments that include personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The labeling system may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.

FIG. 5 is a flow diagram that illustrates the processing of the label documents component of the labeling system in one embodiment. The label documents component is passed a web page and returns an assignment of labels for the blocks of the web page. In block 501, the component generates a vision tree for the web page. In block 502, the component invokes the generate junction tree component to generate a junction tree based on the vision tree. In block 503, the component invokes the propagate beliefs component to propagate the beliefs for the assignments of the labels to the vertices of the junction tree. In block 504, the component assigns the labels with the highest probability to the blocks of the web page and then completes.

FIG. 6 is a flow diagram that illustrates the processing of the generate junction tree component of the labeling system in one embodiment. The component is passed a vision tree and generates a junction tree as described in Cowell, R., Dawid, A., Lauritzen, S., and Spiegelhalter, D., “Probabilistic Networks and Expert Systems,” Springer-Verlag, 1999. In block 601, the component orders the nodes of the vision tree. In block 602, the component identifies the elimination cliques using node elimination. In block 603, the component generates a cluster graph from the elimination cliques. In block 604, the component adds weights to the edges based on the probabilities. In block 605, the component identifies a spanning tree with the maximum weight. The identified spanning tree is the junction tree. The component then returns.
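Blocks 601 and 602 can be illustrated with a short Python sketch of node elimination. It assumes an undirected graph given as an adjacency map and an elimination order supplied by the caller (here, leaf blocks first); the function name `elimination_cliques` and the node names are hypothetical, and the text does not specify how the order is chosen.

```python
def elimination_cliques(adjacency, order):
    """Eliminate nodes in the given order; return the maximal elimination cliques."""
    adj = {v: set(neighbours) for v, neighbours in adjacency.items()}
    found = []
    for v in order:
        found.append(frozenset(adj[v] | {v}))   # v together with its remaining neighbours
        for a in adj[v]:
            adj[a] |= adj[v] - {a}              # connect the neighbours of v (fill-in)
            adj[a].discard(v)                   # remove v from the graph
        del adj[v]
    # Keep only cliques that are not strictly contained in another clique.
    return [c for c in found if not any(c < d for d in found)]


# Hierarchical graph with parent-child and adjacent-sibling edges (illustrative).
edge_list = [
    ("page", "r160"), ("page", "r170"), ("r160", "r170"),
    ("r160", "b162"), ("r160", "b163"), ("b162", "b163"),
    ("r170", "b172"), ("r170", "b174"), ("r170", "b175"),
    ("b172", "b174"), ("b174", "b175"),
]
adjacency = {}
for a, b in edge_list:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

order = ["b162", "b163", "b172", "b175", "b174", "r160", "r170", "page"]
max_cliques = elimination_cliques(adjacency, order)
```

For this graph and order no fill-in edges are needed, and every maximal clique is a triangle made of a parent and two adjacent children.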

FIG. 7 is a flow diagram that illustrates the processing of the propagate beliefs component of the labeling system in one embodiment. The component implements the two-phase schedule algorithm by invoking the collect component in block 701 passing the root node of the junction tree and then invoking the distribute component in block 702 passing the root node of the junction tree. The collect component and the distribute component are recursive routines that collect and distribute the potentials of the nodes of the junction tree. The component then returns.

FIG. 8 is a flow diagram that illustrates the processing of the collect component of the labeling system in one embodiment. The component recursively invokes itself for each child clique of the passed clique to collect the potentials from the child cliques. In block 801, the component calculates the potential of the passed clique. In blocks 802-805, the component loops selecting each child clique of the passed clique. In block 802, the component selects the next child clique. In decision block 803, if all the child cliques have already been selected, then the component returns the accumulated potential, else the component continues at block 804. In block 804, the component invokes the collect component recursively passing the selected child clique. In block 805, the component accumulates the product of the potential of the passed clique with the potential provided by the selected child clique and then loops to block 802 to select the next child clique.

FIG. 9 is a flow diagram that illustrates the processing of the distribute component of the labeling system in one embodiment. The component is passed a clique along with a potential to be distributed to that clique. In block 901, the component calculates a new potential for the clique factoring in the passed potential. In blocks 902-904, the component loops selecting each child clique and recursively invoking itself. In block 902, the component selects the next child clique. In decision block 903, if all the child cliques have already been selected, then the component returns, else the component continues at block 904. In block 904, the component recursively invokes the distribute component passing the child clique and then loops to block 902 to select the next child clique.
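The recursive collect and distribute passes can be sketched in Python on a simplified model. This is not the clique-and-separator schedule of the figures: to stay short, the sketch passes messages over a rooted tree of single variables with pairwise potentials, which is a swapped-in simplification. The names `propagate` and `brute_force`, the potentials, and the tree shape are hypothetical. Passing `combine=max` replaces the summations with maximizations, as the description does when searching for the best labeling.

```python
import itertools
from math import prod


def propagate(children, phi, psi, root=0, combine=sum):
    """Two-pass propagation on a rooted tree; returns unnormalized beliefs per node.

    phi[v][a] is the potential of node v taking label a;
    psi[(p, c)][a][b] is the potential of parent p taking a and child c taking b.
    """
    n_labels = len(phi[root])
    up_msg, belief = {}, {}

    def collect(v, parent):
        kids = children.get(v, [])
        for c in kids:
            collect(c, v)                       # gather from the subtree first
        if parent is not None:
            up_msg[v] = [
                combine(psi[(parent, v)][a][b] * phi[v][b]
                        * prod(up_msg[c][b] for c in kids)
                        for b in range(n_labels))
                for a in range(n_labels)
            ]

    def distribute(v, down):
        kids = children.get(v, [])
        belief[v] = [phi[v][a] * down[a] * prod(up_msg[c][a] for c in kids)
                     for a in range(n_labels)]
        for c in kids:
            # Everything known at v except what came from child c itself.
            rest = [phi[v][a] * down[a]
                    * prod(up_msg[o][a] for o in kids if o != c)
                    for a in range(n_labels)]
            distribute(c, [combine(psi[(v, c)][a][b] * rest[a] for a in range(n_labels))
                           for b in range(n_labels)])

    collect(root, None)
    distribute(root, [1] * n_labels)
    return belief


def brute_force(children, phi, psi, combine=sum):
    """Reference answer by enumerating every labeling (small trees only)."""
    n, n_labels = len(phi), len(phi[0])
    table = {}
    for y in itertools.product(range(n_labels), repeat=n):
        w = prod(phi[i][y[i]] for i in range(n))
        w *= prod(psi[(p, c)][y[p]][y[c]] for p in children for c in children[p])
        table[y] = w
    return [[combine(w for y, w in table.items() if y[v] == a)
             for a in range(n_labels)] for v in range(n)]


children = {0: [1, 2], 2: [3]}                  # illustrative tree
phi = [[1, 2], [3, 1], [1, 1], [2, 5]]          # illustrative node potentials
same = [[2, 1], [1, 2]]                         # favours equal labels on an edge
psi = {(0, 1): same, (0, 2): same, (2, 3): same}

marginals = propagate(children, phi, psi)
max_marginals = propagate(children, phi, psi, combine=max)
best = tuple(m.index(max(m)) for m in (max_marginals[v] for v in range(4)))
```

Reading the best label of each node from its max-marginal recovers the single best labeling only when that labeling is unique, which holds for the illustrative potentials used here.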

FIG. 10 is a flow diagram that illustrates the processing of the learn parameters component of the labeling system in one embodiment. The learn parameters component inputs the training data and learns the weights for the feature functions. In block 1001, the component inputs the training data. In block 1002, the component calculates the expectations based on the training data. In block 1003, the component generates a junction tree for each web page of the training data by invoking the generate junction tree component for each web page. In block 1004, the component initializes potentials of the junction trees. In blocks 1005-1010, the component loops selecting new weights until the weights converge on a solution. In block 1005, the component selects the next junction tree. In decision block 1006, if all the junction trees have already been selected, then the component continues at block 1008, else the component continues at block 1007. In block 1007, the component invokes the propagate beliefs component to propagate the beliefs for the selected junction tree and then loops to block 1005 to select the next junction tree. In block 1008, the component calculates a differential between the expectation based on the propagated beliefs and the expectation based on the training data. In decision block 1009, if the differential approaches zero, then the component returns with a solution, else the component continues at block 1010. In block 1010, the component adjusts the weights of the feature functions in the direction indicated by the gradient and then loops to block 1005 to propagate beliefs of the junction trees with the new weights.
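The learning loop can be sketched in Python under several simplifying assumptions. The sketch uses plain gradient ascent with a fixed step instead of L-BFGS, computes the model expectations by enumerating all labelings instead of propagating beliefs on a junction tree, and works on two-block pages with a single edge. The feature definitions, the training examples, the step size, and the stopping tolerance are all hypothetical.

```python
import itertools
import math

LABELS = ("Name", "Price")  # hypothetical label set


def features(x, y):
    """Hypothetical feature counts summed over both vertices and the single edge."""
    f_name = sum(1.0 for tok, lab in zip(x, y)
                 if lab == "Name" and tok[:1].isupper())
    f_price = sum(1.0 for tok, lab in zip(x, y)
                  if lab == "Price" and any(ch.isdigit() for ch in tok))
    f_differ = 1.0 if y[0] != y[1] else 0.0
    return (f_name, f_price, f_differ)


def distribution(x, w):
    """p(y|x, w) for every labeling, normalized by enumeration."""
    labelings = list(itertools.product(LABELS, repeat=len(x)))
    scores = [math.exp(sum(wk * fk for wk, fk in zip(w, features(x, y))))
              for y in labelings]
    z = sum(scores)
    return {y: s / z for y, s in zip(labelings, scores)}


def log_likelihood(data, w):
    """Average log-likelihood of the training labelings."""
    return sum(math.log(distribution(x, w)[y]) for x, y in data) / len(data)


def gradient(data, w):
    """Empirical feature expectation minus model feature expectation."""
    grad = [0.0] * len(w)
    for x, y in data:
        p = distribution(x, w)
        observed = features(x, y)
        for k in range(len(w)):
            expected = sum(p[yy] * features(x, yy)[k] for yy in p)
            grad[k] += (observed[k] - expected) / len(data)
    return grad


def learn(data, n_features=3, step=0.5, max_iters=200, tol=1e-3):
    """Adjust the weights along the gradient until it is close to zero."""
    w = [0.0] * n_features
    for _ in range(max_iters):
        g = gradient(data, w)
        if max(abs(gk) for gk in g) < tol:
            break
        w = [wk + step * gk for wk, gk in zip(w, g)]
    return w


data = [                                         # illustrative training pages
    (["Canon", "$199"], ("Name", "Price")),
    (["Nikon", "D50"], ("Name", "Name")),
    (["Sony", "$89"], ("Name", "Price")),
]
weights = learn(data)
```

Because this toy training set can be fit perfectly, the weights keep growing and the iteration cap usually ends the loop before the tolerance does; a practical system would rely on a quasi-Newton method such as the L-BFGS technique cited in the description.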

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. In particular, the hierarchical CRF technique may be used to label any type of observations that have a hierarchical relationship. Accordingly, the invention is not limited except as by the appended claims.

1-20. (canceled)
21. A method performed by a computing device with a processor and memory for labeling observations, the method comprising: receiving observations having hierarchical relationships represented by a graph having vertices representing observations and edges representing relationships, a collection of related vertices being a clique, a clique being a subset of vertices of the graph in which each pair of distinct vertices in the subset is joined by an edge; storing the received observations in the memory; determining by the computing device a labeling for the observations using a conditional random fields technique that factors in the hierarchical relationships, a conditional probability of a label for a given observation being based on feature functions for a vertex clique, an edge clique, and a triangle clique for the label; and storing by the computing device the labeling for the observations.
22. The method of claim 21 wherein the observations are represented as a tree of observation vertices and the determining includes identifying a hierarchy of cliques of observation vertices within the tree and calculating a probability for sets of labels based on probabilities derived from features of components of the cliques that contain the observation vertices.
23. The method of claim 22 wherein the components of a clique include the edges and vertices of the clique.
24. The method of claim 22 wherein the calculating of the probability for a set of labels includes generating a junction tree of the cliques and propagating a belief to the cliques of the junction tree.
25. The method of claim 24 wherein the beliefs are propagated using a collection phase and a distribution phase.
26. The method of claim 21 including deriving weights for feature functions based on training data and wherein the determining includes calculating a probability for a set of labels based on the training data.
27. The method of claim 26 wherein the deriving includes optimizing a log-likelihood function based on the training data.
28. The method of claim 26 wherein the optimizing uses a gradient-based L-BFGS technique.
29. The method of claim 21 wherein the determining of the labeling includes propagating probability-related calculations from observation to observation.
30. A computer-readable storage medium containing instructions for controlling a computing device to identify object records and object elements of a web page, by a method comprising: receiving a hierarchical representation of blocks of the web page, each block representing an object record or an object element, the blocks represented by observations having hierarchical relationships represented by a graph having vertices representing observations and edges representing relationships, a collection of related vertices being a clique, a clique being a subset of vertices of the graph in which each pair of distinct vertices in the subset is joined by an edge; and applying a hierarchical conditional random fields technique to jointly identify a set of record labels and element labels for the blocks based on the hierarchical relationship of the blocks of the web page, the applying including identifying the labels using a conditional random fields technique that factors in the hierarchical relationships, a conditional probability of a label for a given observation being based on feature functions for a vertex clique, an edge clique, and a triangle clique for the label.
31. The computer-readable storage medium of claim 30 wherein the observations are represented as a tree of observation vertices and the identifying includes identifying a hierarchy of cliques of observation vertices within the tree and calculating a probability for sets of labels based on probabilities derived from features of components of the cliques that contain the observation vertices.
32. The computer-readable storage medium of claim 31 wherein the calculating of the probability for a set of labels includes generating a junction tree of the cliques and propagating a belief to the cliques of the junction tree.
33. The computer-readable storage medium of claim 32 wherein the beliefs are propagated using a collection phase and a distribution phase.
34. The computer-readable storage medium of claim 30 including deriving weights for feature functions based on training data and wherein the identifying includes calculating a probability for a set of labels based on the training data.
35. The computer-readable storage medium of claim 30 wherein the identifying of the labeling includes propagating probability-related calculations from observation to observation.
36. A computing device for labeling observations, comprising: a memory storing computer-executable instructions that: receive observations having hierarchical relationships represented by a graph having vertices representing observations and edges representing relationships, a collection of related vertices being a clique; determine a labeling for the observations using a conditional random fields technique that factors in the hierarchical relationships, a conditional probability of a label for a given observation being based on feature functions for a vertex clique, an edge clique, and a triangle clique for the label; and a processor for executing the computer-executable instructions stored in the memory.
37. The computing device of claim 36 wherein the observations are represented as a tree of observation vertices and the determination includes identification of a hierarchy of cliques of observation vertices within the tree and calculation of a probability for sets of labels based on probabilities derived from features of components of the cliques that contain the observation vertices.
38. The computing device of claim 37 wherein the components of a clique include the edges and vertices of the clique.
39. The computing device of claim 36 wherein a clique is a subset of vertices of the graph in which each pair of distinct vertices in the subset is joined by an edge.
40. The computing device of claim 36 including deriving weights for feature functions based on training data and wherein the determination includes calculation of a probability for a set of labels based on the training data.